Semi-automatic language model acquisition without large corpora
Abstract
Statistical language models have gained a reputation for their overall performance in speech recognition, and so are widely used in speech recognition systems today. The tasks to which statistical language models can be applied are, however, limited, because a large corpus is essential for building a statistical model, and collecting a new corpus is very costly in terms of time and effort. Thus, if our aim is to apply speech recognition to various tasks as required, we need a way of developing a new language model for a given task at a reasonable cost.
Similar resources
Semi-automatic acquisition of domain-specific semantic structures
This paper describes a methodology for semi-automatic grammar induction from unannotated corpora belonging to a restricted domain. The grammar contains both semantic and syntactic structures, which are conducive to language understanding. Our work aims to ameliorate the reliance of grammar development on expert handcrafting or the availability of annotated corpora. To strive for a reasonab...
Lexical Knowledge Acquisition from Corpora
The paper presents a computational environment to support developing a lexicon for natural language processing. The underlying idea of the environment is to utilize up-to-date language technologies to minimize both the human labor and the inconsistency that are unavoidable in manual compilation of a lexicon. The proposed computational environment enables an efficient construction of a consistent ...
Towards a Workbench for Acquisition of Domain Knowledge from Natural Language
In this paper we describe the architecture and functionality of the main components of a workbench for the acquisition of domain knowledge from large text corpora. The workbench supports an incremental process of corpus analysis starting from a rough automatic extraction and organization of lexico-semantic regularities and ending with a computer supported analysis of extracted data and a semi-automat...
Contextual Meta-Knowledge Acquisition from Corpora
This paper looks at the area of automatic acquisition of meta-knowledge for the structuring of very large knowledge bases (VLKB). It is argued that we will rediscover the need in Natural Language Processing (NLP) for such large knowledge bases and that one possible method for structuring them efficiently lies in association-based statistics gathered from corpora. The discussion sets out the aims ...
Deriving an LFG from a Treebank Resource
High-quality training corpora are crucial for statistical approaches to natural language processing. For probabilistic Lexical Functional Grammars (LFG-DOP; Bod R. & Kaplan R. 1998), significant corpora of texts associated with both c-structure and f-structure representations are required. This poses an important acquisition problem: manual construction is time-consuming and error-prone while s...
Publication date: 2000